Background/Context: Large-scale randomized controlled experiments conducted in authentic
learning environments are commonly high stakes, carrying extensive costs and requiring
lengthy commitments for all-or-nothing results amidst many potential obstacles. Educational 
technologies harbor an untapped potential to provide researchers with access to extensive and 
diverse subject pools of students interacting with educational materials in authentic ways. These 
systems log extensive data on student performance that can be used to identify and leverage best 
practices in education and guide systemic policy change. Tomorrow’s educational technologies 
should be built upon rigorous standards set forth by the research revolution budding today.


Purpose/Objective/Research Question/Focus of Study: The present work serves as a call to 
the community to infuse popular learning platforms with the capacity to support collaborative 
research at scale. 


Research Design: This article defines how educational technologies can be leveraged for use in 
collaborative research environments by highlighting the research revolution of ASSISTments 
(www.ASSISTments.org), a popular online learning platform with a focus on mathematics 
education. A framework described as the cycle of perpetual evolution is presented, and research 
exemplifying progression through this framework is discussed in support of the many benefits 
that stem from infusing EdTech with collaborative research. Through a recent NSF grant 
(SI2-SSE@SSI: 1440753), researchers from around the world can leverage ASSISTments’ 
content and user population by designing and implementing randomized controlled experi- 
ments within the ASSISTments TestBed (www.ASSISTmentsTestBed.org). Findings from 
these studies help to define best practices within technology-driven learning, while simultane- 
ously allowing for augmentation of the system’s content, delivery, and infrastructure. 


Conclusions/Recommendations: Supplementing educational technologies with environments
for sound, collaborative science can result in a broad range of benefits for students,
researchers, platforms, and educational practice and policy. This article outlines the successful
uptake of research efforts by ASSISTments in hopes of advocating a research revolution for
other educational technologies. 


INTRODUCTION 


Educational psychologists, researchers, and practitioners have grown ac- 
customed to the complex and time-consuming nature of studying effec- 
tive classroom practices. When studying learning interventions, seasoned 
experts turn to the gold standard in determining causality: the random- 
ized controlled experiment (RCE). Yet despite a recent call encourag- 
ing the use of RCEs within authentic learning environments (Institute of 
Education Sciences [IES], 2013), and despite the nearly infinite array of 
complexities to be examined within the context of instruction (Koedinger, 
Booth, & Klahr, 2013), RCEs can be difficult to conduct in real-world class- 
rooms (National Research Council, 2002). Common complications in- 
clude IRB restrictions, lengthy and invasive pre- and post-tests, curriculum 
restrictions for the design of strict controls, and the large samples
required to detect statistically reliable effects. Further, experimental designs
must be carefully vetted prior to implementation in an attempt to account
for as much variance as possible. Thorough organization is also necessary
when recording and maintaining anonymized student data. With so
many moving parts, traditional classroom RCEs leave numerous windows 
for error and bias. Even when reporting findings, publication bias and 
the cherry-picking of results can lead to non-replicability, contributing 
to a growing crisis of faith in RCEs spanning numerous scientific fields 
(Achenbach, 2015; Ioannidis, 2005; Open Science Collaboration, 2015). 
Additionally, while a handful of traditional classroom RCEs have led to 
significant implications for educational practice and policy, most lack the 
statistical power necessary to observe reliable improvements in student 
achievement because they are restricted by class- or school-level random- 
ization (i.e., all students within a particular class or school fall within the 
same experimental condition, resulting in drastically reduced sample 
sizes). High-stakes explorations at scale (e.g., stressful make-or-break lon- 
gitudinal studies costing millions of dollars) often include thousands of 
students and span multiple years but still fall short of identifying learning 
interventions that reliably enhance student achievement. 
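
The power cost of class- or school-level randomization described above can be illustrated with the standard design effect for equal-sized clusters, 1 + (m - 1) * ICC, where m is the cluster size and ICC is the intraclass correlation. The following is a minimal sketch only: the function names are hypothetical, and the class size, student count, and ICC value are assumptions chosen for illustration, not figures from any study cited here.

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(n_students: int, cluster_size: float, icc: float) -> float:
    """Approximate number of independent observations a clustered sample is worth."""
    return n_students / design_effect(cluster_size, icc)


# Assumed, illustrative values: 2,000 students in classes of 25, ICC = 0.20.
n_students, class_size, icc = 2000, 25, 0.20
deff = design_effect(class_size, icc)
ess = effective_sample_size(n_students, class_size, icc)
print(f"design effect = {deff:.2f}, effective sample size = {ess:.0f}")
```

Under these assumed values the design effect is 5.8, so 2,000 class-randomized students carry roughly the information of 345 individually randomized students. With a cluster size of 1 (student-level randomization), the design effect is 1 and no sample size is lost.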

While it is crucial that high standards exist for educational research, the 
present work investigates the use of educational technologies to simplify 
the process of conducting RCEs within authentic learning environments, 
making research at scale more feasible and more accessible to researchers. 
Infusing popular learning platforms with the capacity to support collaborative
research environments has the potential to lower the stakes by
drastically reducing costs, promoting validated universal measures of 
achievement, and assisting researchers through the process of designing, 
implementing, and analyzing RCEs conducted at scale within real-world 
classrooms. Supplementing educational technologies with environments 
for sound, collaborative science can result in a broad range of benefits for 
students, researchers, platforms, and educational practice and policy. 


THE GROWTH OF EDUCATIONAL TECHNOLOGIES 


Educational technologies offer the novel opportunity to drive best prac- 
tices in K-12 education by testing what works in authentic learning envi- 
ronments while simultaneously simplifying the process of educational re- 
search. Technology is gaining acceptance in the modern classroom, with 
intelligent tutoring systems (ITS), computer-aided testing platforms, and 
adaptive learning applications offering new and unique approaches to 
learning, heralding a transition from teaching-based practices to learn- 
ing-based practices (Bush & Mott, 2009), and producing exponential 
growth in the availability of educational data. Educational technologies 
commonly include immediate feedback, adaptive assistance, elements 
that enhance student motivation and engagement, and assessment tools 
for teachers and administrators that help to drive data-driven classroom 
practices. Therefore, the National Education Technology Plan predicted 
that these platforms would play a key role in personalizing educational 
interventions (U.S. Department of Education, 2010). However, less fo- 
cus has been devoted to one of the primary forces driving successful 
personalization: the use of adaptive learning technologies to conduct 
educational research. 

These platforms and applications already have great promise for ex- 
tending the accessibility of educational materials and improving learning 
outcomes across diverse populations. At scale, the data collected from 
these technologies can be leveraged in dynamic ways that may reveal revo- 
lutionary insights about learning. Entire fields of research are growing 
alongside educational technologies in hopes of better understanding how 
these tools and their data can be used to improve education (e.g., learn- 
ing analytics and educational data mining). However, despite significant 
growth in researcher interest, few platforms currently available to teach- 
ers and students allow for real-time hypothesis testing. In lieu of in vivo 
experimentation, researchers often turn to logged data to model student 
performance, make predictions regarding learning, and determine the 
effectiveness of system features (Koedinger, Baker, et al., 2010). “Big Data” 
in education has grown synonymous with solutions that enhance educational
practices, platforms, and theories. Still, a critical link is missing:
causality. Examining the causal effects of specific learning interventions 
through “Big Experimentation” would allow researchers to begin answer- 
ing three questions to truly drive personalized education: What works 
best? For whom? When? By determining the interventions that work best 
for particular students and the optimal time to deliver those interventions, 
controlled experimentation conducted within these platforms has the po- 
tential to revolutionize the future of education. 


THE ASSISTMENTS PLATFORM 


Despite expansion in the availability of adaptive learning technologies in recent
years, popular platforms have been very slow to mobilize, support, and
leverage randomized controlled experimentation (Williams, Maldonado, et 
al., 2015; Williams, Ostrow, et al., 2015). ASSISTments is an online learn- 
ing platform that was designed with the flexibility to house RCEs and has 
supported the publication of more than two dozen peer-reviewed articles 
on learning since its inception in 2002 (Heffernan & Heffernan, 2014). 
The platform, offered as a free service of Worcester Polytechnic Institute 
(WPI), is an increasingly powerful tool that provides students with assis- 
tance while offering teachers assessment. Over $14 million in grant funding 
from the IES and the NSF has supported twelve years of co-development 
with teachers and researchers to establish a unique tool for educational re- 
search at scale. Historically, the primary investigators of these studies have 
had close connections to WPI (e.g., graduate students or other researchers 
working closely with the ASSISTments Team). However, a recent NSF grant 
(Heffernan & Williams, 2014) has helped to launch a formal infrastructure 
that allows external researchers to use ASSISTments as a shared scientific 
tool. This supplementary infrastructure is called the ASSISTments TestBed 
(www.ASSISTmentsTestBed.org). While other systems have the potential to 
provide many of the same classroom benefits as ASSISTments, none pro- 
mote an infrastructure allowing educational researchers to design and im- 
plement content-based experimentation and to do so with ease. 

Doubling its user population each year for almost a decade, ASSISTments 
is used by hundreds of teachers and over 50,000 students around the 
world, with over 10 million problems solved in the 2013-2014 school year. 
Although most content pertains to middle school mathematics, teachers 
from alternative domains such as history, biology, and statistics have also 
built material to harness the powers of the platform in their own class- 
rooms. Content is built at the problem level, as shown in Figure 1. The 
problem builder allows teachers and researchers to design questions and 
tutorial strategies using a simple interface that allows for the inclusion of
text, graphics, and hypermedia elements. The builder is unique in that it 
allows for efficient content design without extensive knowledge of com- 
puter programming. Questions can then be combined to form problem 
sets for assignment to students. Teachers commonly use ASSISTments to 
assign classwork and homework with immediate feedback and rich tutor- 
ing, but they can also turn off feedback elements to assign content as a test 
or quiz. Use of ASSISTments has been shown to reliably improve students’ 
learning in comparison to traditional paper-and-pencil approaches (Kelly, 
Heffernan, Heffernan, et al., 2013; Koedinger, McLaughlin, & Heffernan, 
2010; Mendicino, Razzaq, & Heffernan, 2009; Miller, Zheng, Means, & 
Van Brunt, 2013; Singh et al., 2011; Soffer et al., 2014). Most recently, 
SRI International reported the results of an efficacy trial of ASSISTments, 
showing that the platform caused large, reliable learning gains on standardized
assessments (Roschelle, Feng, Murphy, & Mason, 2016).

In addition to building content, teachers and researchers are able to ac- 
cess an extensive library of prebuilt content and textbook material. Full 
problem content is available for more than 20 of the top seventh-grade 
mathematics texts in the United States, delivered without infringing on 
copyright. Teachers can select from prebuilt problem sets or use and alter 
copies of content to develop their own problem sets. There are two primary 
types of problem sets within ASSISTments. A linear problem set has a prede- 
termined number of problems, and the assignment is considered complete 
when the student has finished all problems, whether or not the answers are 
accurate. Alternatively, in a skill builder problem set, students must solve 
problems selected at random from a skill pool until reaching a predeter- 
mined threshold of mastery (e.g., answering three consecutive questions 
accurately on first attempts). Although the system default is three problems, 
mastery can be redefined to include any number of consecutive accurate 
problems. In both types of problem sets, assistance can vary to include cor- 
rectness feedback, tutoring specific to particular problems, or worked ex- 
amples depicting solutions to isomorphic problems. Tutoring strategies in- 
clude hint messages, scaffolding problems (used to break a problem down 
into steps), and mistake messages (feedback tailored to common wrong 
answers). Hints, scaffolds, and mistake messages are compared in Figure 
2. If researchers do not wish to design their own content, over 300 certified 
skill builders tailored by the ASSISTments team to the Common Core state 
standards for mathematics (National Governors Association Center for Best 
Practices & Council of Chief State School Officers, 2010) can be manipu- 
lated to incorporate experimental modifications. 
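
The skill builder completion rule described above (random problems from a skill pool until a run of consecutive first-attempt correct answers) can be sketched as follows. This is an illustrative sketch, not ASSISTments code: the function name, the `answers_correctly` callback, the `seed` parameter, and the `max_problems` cap are all assumptions introduced here.

```python
import random
from typing import Callable, List


def run_skill_builder(
    skill_pool: List[str],
    answers_correctly: Callable[[str], bool],
    mastery_threshold: int = 3,   # default of three named in the text
    max_problems: int = 100,      # assumed safety cap, not from the source
    seed: int = 0,
) -> int:
    """Serve random problems until `mastery_threshold` consecutive first-attempt
    correct answers. Returns the number of problems attempted, or -1 if the cap
    is reached without mastery."""
    rng = random.Random(seed)
    streak = 0
    for attempted in range(1, max_problems + 1):
        problem = rng.choice(skill_pool)
        if answers_correctly(problem):
            streak += 1
        else:
            streak = 0  # any miss resets the run of consecutive correct answers
        if streak >= mastery_threshold:
            return attempted
    return -1


# A student who misses the second problem needs five problems to reach mastery.
responses = iter([True, False, True, True, True])
print(run_skill_builder(["p1", "p2", "p3"], lambda _problem: next(responses)))  # prints 5
```

Passing a different `mastery_threshold` mirrors the option to redefine mastery as any number of consecutive accurate problems.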

Figure 1. Example of a problem viewed within the builder: Notice that
the interface allows creation of the problem itself, answers (both correct 
and incorrect), and tutoring strategies, and the navigation menu in the 
top right corner allows the user to navigate from editing a main problem 
to editing feedback. 


[Figure 1 image: builder view of problem PRA4XBW, "Stem and Leaf - Mean," Main Problem 1.
Problem body: "The following stem and leaf plot shows the number of shoes sold each week
at a store. According to this plot, what is the mean of shoes sold each week? (Round to
the nearest hundredths place)." Answer: 33.5. Tutoring strategies: hints.]


TCR, 119, 030306 Tomorrow’s EdTech Today 


[Figure 2 image: three tutoring panels for the problem "Find the mean (average) of the
numbers below: 11, 4, 13, 8."]

Figure 2. Comparison of hints, a scaffold problem, and a mistake message in response to
the same problem content. Three hints are shown on the left as requested by the student.
In the middle, the student provided an incorrect response and was automatically given a
scaffold with a worked example of how to solve a similar problem. On the right, a mistake
message is provided in response to a specific wrong answer, with detailed tutoring on
strategy revision.


Teachers College Record, 119, 030306 (2017) 


ASSISTments also offers optional features such as the Automatic 
Reassessment and Relearning System (ARRS), which helps to reassess 
student retention following skill builder mastery (Xiong & Beck, 2014), 
and PLACEments, a prerequisite skill-training system that allows teach- 
ers to create skill tests that pinpoint and help to alleviate knowledge gaps 
(Whorton, 2013). When a teacher elects to use ARRS, students who have completed a
skill builder are given a series of single-question reassessment
tests, scheduled 7, 14, 28, and finally 56 days after the initial learning
experience, to estimate skill retention. If students fail to answer the
reassessment question accurately, they are provided support to relearn 
the material through a secondary skill builder. Research has shown that 
ARRS significantly enhances longitudinal skill understanding and student 
assessment (Soffer et al., 2014; Wang & Heffernan, 2014). Like ARRS, 
PLACEments is also connected to skill builder content. PLACEments acts 
as a computer adaptive test that taps into a hierarchy of prerequisite skills 
to personalize the remediations a student should receive based on per- 
formance on an initial skill test. Research has shown that PLACEments is 
a useful tool for isolating learning gaps that can also help to strengthen 
curriculum through a stronger understanding of prerequisite skill rela- 
tionships (Adjei & Heffernan, 2015). 
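
The ARRS reassessment schedule described above can be sketched as simple date arithmetic. This sketch reads the four intervals as offsets from the date of skill builder mastery, as the text states; how a failed reassessment affects later tests is not specified here, so it is not modeled. The function and constant names are hypothetical.

```python
from datetime import date, timedelta
from typing import List, Sequence

# Intervals (in days after the initial learning experience) named in the text.
ARRS_INTERVALS_DAYS = (7, 14, 28, 56)


def reassessment_dates(
    mastery_date: date, intervals: Sequence[int] = ARRS_INTERVALS_DAYS
) -> List[date]:
    """Return the dates on which single-question reassessment tests fall due."""
    return [mastery_date + timedelta(days=days) for days in intervals]


# A student who masters a skill builder on 1 September 2014 (assumed date).
for due in reassessment_dates(date(2014, 9, 1)):
    print(due.isoformat())
```

For the assumed mastery date, the tests fall on 8, 15, and 29 September and on 27 October 2014.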

As an assessment tool, ASSISTments offers teachers a myriad of student 
and class reports that allow an expansion of classroom practices through 
actionable data. An example of an item report, the most commonly used 
report, is shown in Figure 3. This report has a column for each problem 
and a row for each student, as well as various summaries of student and 
problem performance. The report can be made anonymous (as shown 
in Figure 3) for teachers to use in the classroom to facilitate discussion. 
This report also allows teachers to pinpoint areas of struggle through com- 
mon wrong answers (errors that were made by at least 10% of students in 
the class). In Figure 3, only 27% of students answered the first problem 
accurately, with 56% of students sharing the common wrong answer of 
1/9*10. This offers an opportunity for discussion that may be lost on students
grading their own homework using traditional classroom methods.
Teachers can also work with students to design mistake messages (like that 
shown in Figure 2) for future students who attempt the problem and share 
the same misconception. 
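
The common wrong answer rule described above (an error made by at least 10% of students in the class) can be sketched as a simple tally. This is an illustration only: it assumes one recorded response per student, and the function name and data layout are hypothetical rather than taken from the ASSISTments item report.

```python
from collections import Counter
from typing import Dict, List, Set


def common_wrong_answers(
    responses: List[str], correct_answers: Set[str], threshold: float = 0.10
) -> Dict[str, float]:
    """Return each wrong answer given by at least `threshold` of the class,
    mapped to the share of students who gave it."""
    if not responses:
        return {}
    class_size = len(responses)
    wrong_counts = Counter(r for r in responses if r not in correct_answers)
    return {
        answer: count / class_size
        for answer, count in wrong_counts.items()
        if count / class_size >= threshold
    }


# Assumed class of 100 mirroring the first problem discussed in the text:
# 27 correct, 56 sharing one wrong answer, 17 scattered one-off errors.
responses = ["1/3*10"] * 27 + ["1/9*10"] * 56 + [f"other-{i}" for i in range(17)]
print(common_wrong_answers(responses, {"1/3*10"}))  # prints {'1/9*10': 0.56}
```

One-off errors fall below the threshold and are filtered out, which is what makes the remaining answers useful prompts for class discussion or for writing a mistake message.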

Figure 3. Excerpt from an anonymized item report: Students are listed
in the first column, followed by average performance, and then specific
performance on each question within the problem set. Teachers can
see whether the student answered correctly or incorrectly, the response
given, whether a tutoring strategy was used, and common wrong answers
as measured across the entire class. Common wrong answers are
actionable; teachers and students can work together to provide a mistake
message for future students.

Student/Problem         Average   PRAHESY        PRAHE5Z        (ID illegible)
Problem Average         60%       27%            61%            84%
Common Wrong Answers              1/9*10, 56%    1/5*13, 58%
                                  +feedback      +feedback
Correct Answer(s)                 1/3*10         1/5*3          1/16*2
[Anonymized student rows, showing each student's average and response per problem, are
not legible in this copy.]

Through NSF funding (Heffernan & Williams, 2014), reports for researchers
have grown far more complex than teacher reports, providing
numerous formats of raw performance data with rich student-, class-, and
school-level covariates, as well as a number of automated analyses. Through
the ASSISTments TestBed, and specifically through the Assessment of
Learning Infrastructure (ALI), researchers are provided weekly automated
reports detailing anonymized study participation (Ostrow et al., 2016).
These reports, as shown in Figure 4, provide basic analyses, including 
bias assessment (examining attrition across experimental conditions) 
and simple hypothesis testing on post-test performance. Researchers 
are also provided a student covariate file, detailing student information 
collected prior to study participation (e.g., prior performance aver- 
age), and four formats of raw data logged by the ASSISTments tutor as 
students work through the assignment. ALI’s reporting and researcher 
communications make the TestBed easier for researchers to use, stream- 
lining research at scale. 
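
The bias assessment described above can be sketched as a Pearson chi-squared test of independence between experimental condition and completion, using the counts reported in Figure 4. This is a sketch under stated assumptions, not ALI's implementation: the function names are hypothetical, and the number of students who started the control condition (99) is inferred here from the reported total (295 - 109 - 87) because that cell is not legible in the figure. The p-value helper relies on the fact that a chi-squared variable with exactly 2 degrees of freedom has upper-tail probability exp(-x / 2).

```python
import math
from typing import Dict, Tuple


def attrition_chi_square(counts: Dict[str, Tuple[int, int]]) -> Tuple[float, int]:
    """Pearson chi-squared statistic for condition x (completed, not completed).

    `counts` maps condition name -> (started, completed).
    Returns (statistic, degrees_of_freedom).
    """
    total_started = sum(started for started, _ in counts.values())
    total_completed = sum(completed for _, completed in counts.values())
    if total_completed in (0, total_started):
        raise ValueError("completion must vary for the test to be defined")
    completion_rate = total_completed / total_started
    statistic = 0.0
    for started, completed in counts.values():
        expected_done = started * completion_rate
        expected_lost = started * (1.0 - completion_rate)
        statistic += (completed - expected_done) ** 2 / expected_done
        statistic += ((started - completed) - expected_lost) ** 2 / expected_lost
    return statistic, len(counts) - 1


def p_value_df2(statistic: float) -> float:
    """Upper-tail p-value, valid only for exactly 2 degrees of freedom."""
    return math.exp(-statistic / 2.0)


counts = {
    "Group A - Experiment 1": (109, 80),
    "Group B - Experiment 2": (87, 60),
    "Group C - Control": (99, 89),  # 99 started is inferred from the total of 295
}
statistic, df = attrition_chi_square(counts)
print(f"chi-squared({df}) = {statistic:.3f}, p = {p_value_df2(statistic):.4f}")
```

With these counts the statistic comes out at about 13.47 with p below .01, in line with the note in the Figure 4 report, which is why hypothesis testing on posttest scores is withheld there.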

Figure 4. The Assessment of Learning Infrastructure (ALI) provides
researchers with logged data from students participating in RCEs within 
the ASSISTments TestBed (Ostrow et al., 2016). This automated report 
is generated weekly and/or at the request of the researcher and presents 
analyses and raw data. Analyses include a chi-squared test comparing the 
observed and expected sample distributions, simple hypothesis testing, 
and an analysis of means in post-test performance. 


The Assessment of Learning Infrastructure (ALI) 


Completion Rates 
Students that have started your study: 329 


Students that have completed your study: 251 


Bias Assessment 

Before analyzing learning outcomes, we suggest first assessing potential bias introduced by your experimental conditions (i.e., 
examine differential attrition). The table below reports the number of students that have completed your study, split out by 
experimental condition. 


Condition                  Started (n)   Completed (n)   Completed (%)
Group A - Experiment 1     109           80              73.39
Group B - Experiment 2     87            60              68.97
Group C - Control          99            89              89.90
Total                      295           229


NOTE: A significant difference was found between observed and expected completion rates across conditions, χ²(2, N = 295) =
13.467, p < .01. This means that a selection effect may have occurred. Hypothesis testing with regard to posttest scores has not been 
conducted out of an abundance of caution. 


Mean and Standard Deviation of Posttest Score by Condition 
To examine learning outcomes at posttest, an analysis of means was conducted across conditions. The table below reports mean 
posttest score and standard deviation for each condition. This information was sourced from our automated posttest sub-report. 


Condition                  Completed (n)   Posttest Score*
Group A - Experiment 1     80              34.40 (4.34)
Group B - Experiment 2     60              32.95 (3.89)
Group C - Control          89              44.11 (3.72)
Total                      229             37.15 (3.98)

* Presented as Mean (SD). 


Raw Data Files 


Raw data files contain the logged information for each student that has participated in your study. We provide this data in a variety of
formats, as explained below, to assist in your analytic efforts. We use Google Docs to share these files with you. If you would like to 
process these files manually, we recommend downloading the CSV file of your choice and saving the file as an Excel spreadsheet or 
workbook to retain formatting and formulas. If you will be passing the file directly to a statistical package, downloading the CSV to a 
convenient location should suffice. 


For a field glossary and tutorials on how to read each type of file, visit our Data Glossary. 


Historical Data 

Covariate File - A collection of useful covariates for the students participating in your study. This file includes student level variables 
(i.e., gender), class level variables, (i.e., homework completion rates), and school level variables (i.e., urbanicity). Click here for a 
tutorial on how to link this file to your experimental data. 


Experimental Data 


1. Action Level - One row per action per student; the finest granularity. Students participating in your study have performed 13,655 
actions (e.g., beginning problems, attempting to answer problems, asking for tutoring, and eventually completing problems). 

2. Problem Level - One row per problem per student. Students participating in your study have completed 2,280 problems. The 
flow through a single problem incorporates many actions, resulting in a coarser data file (fewer rows). 

3. Student Level - One row per student; the coarsest granularity. Columns are laid out in opportunity order to depict the student’s 
progression through the problem set. Problem level information is expanded to one column per problem per field (column 
heavy). 

4. Student Level + Problem Level - One row per field per student. Columns are laid out in opportunity order to depict the student’s 
progression through the problem set. An alternative view of student level information (row heavy). 
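
The relationship between the granularities listed above can be sketched by collapsing action-level rows into problem-level rows and then into a per-student summary. The schema here is hypothetical: the tuple layout, the action names, and the output fields are assumptions for illustration, and the actual field names are those defined in the ALI Data Glossary.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical action-level row: (student_id, problem_id, action_type)
Action = Tuple[str, str, str]


def to_problem_level(actions: Iterable[Action]) -> List[Dict[str, Any]]:
    """Collapse action-level rows to one row per problem per student."""
    rows: Dict[Tuple[str, str], Dict[str, Any]] = {}
    for student_id, problem_id, action_type in actions:
        row = rows.setdefault(
            (student_id, problem_id),
            {"student_id": student_id, "problem_id": problem_id, "attempts": 0, "hints": 0},
        )
        if action_type == "attempt":
            row["attempts"] += 1
        elif action_type == "hint":
            row["hints"] += 1
    return list(rows.values())


def to_student_level(problem_rows: Iterable[Dict[str, Any]]) -> Dict[str, int]:
    """Collapse problem-level rows to a count of problems seen per student."""
    counts: Dict[str, int] = {}
    for row in problem_rows:
        counts[row["student_id"]] = counts.get(row["student_id"], 0) + 1
    return counts


actions = [
    ("s1", "p1", "start"), ("s1", "p1", "hint"), ("s1", "p1", "attempt"),
    ("s1", "p2", "start"), ("s1", "p2", "attempt"),
    ("s2", "p1", "start"), ("s2", "p1", "attempt"), ("s2", "p1", "attempt"),
]
problem_rows = to_problem_level(actions)
print(len(actions), "actions ->", len(problem_rows), "problem rows ->",
      len(to_student_level(problem_rows)), "student rows")
```

Each step trades rows for coarser summaries, which is why the action-level file has the most rows and the student-level file the fewest.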


TECHNOLOGY-SUPPORTED RANDOMIZED
CONTROLLED EXPERIMENTATION


Through the ASSISTments TestBed, researchers are able to design mini- 
mally invasive RCEs within easily accessible and highly used educational 
content delivered by ASSISTments while receiving organized reports de- 
tailing student performance to streamline the analysis of learning inter- 
ventions. This type of open research environment is rare within learning 
technologies. The common use for RCEs or A/B testing within popular 
technologies is to optimize user experience or prolong user interaction. 
For instance, Google experiments with advertisement location to maxi- 
mize ad traffic without diminishing the user experience. Similarly, gaming 
application creators such as Zynga conduct A/B testing to optimize their 
games in a way that will retain users while promoting ad space. Although 
these approaches are consistent in marketing, few large-scale education 
platforms show an outward interest in examining learning interactions and 
optimizing learning gains. Massive open online course (MOOC) platforms 
and large-scale learning tools such as Coursera, EdX, Udacity, openHPI, 
and Google’s Course Builder focus on delivering content, while spending 
little time or money thoroughly examining the effects of what they deliver. 
This argument is not intended to suggest a complete lack of sound re- 
search but instead to point out that few researchers have access to course 
data from these platforms to improve user interfaces or curriculum deliv- 
ery. Even commercialized educational technologies lack open and easily 
accessible avenues for empirical research. For instance, the popular Khan 
Academy provides resources and support for select researchers to work 
through a process requiring substantial time and effort to understand the 
dynamics of the system. Creating and running an experiment within Khan 
Academy requires knowledge of the platform’s open-source code, the cod- 
ing skills necessary to make modifications to implement experimentation, 
and progression through a standard code-review process working alongside 
Khan Academy developers. Obtaining data files following an experiment 
is also heavily reliant on system programmers. To our knowledge, none of 
the A/B experiments that researchers have patiently conducted on Khan 
Academy have been formally published (see, e.g., Williams & Williams, 
2013; Williams, Paunesku, Haley, & Sohl-Dickstein, 2013). Instead, work 
with less regard for improving specific interventions has evaluated the plat- 
form’s efficacy in schools (Murphy, Gallagher, Krumm, Mislevy, & Hafter, 
2014) and prediction models for large-scale but secondary data (Piech et 
al., 2015). Such major platforms should be reframed with a focus on open 
educational research at scale or should at least support the open collection 
of anonymized data through APIs to inform EdTech policy. 

The application of stringent research methodologies to improve learning
technologies and educational outcomes is severely lagging. This deficit
is what makes the ASSISTments TestBed so unique. The TestBed guides
researchers through the process of running practical RCEs by leveraging 
ASSISTments’ content and user population. There are currently over 
130 RCEs running within the ASSISTments TestBed. These studies are 
directed at solving practical problems within education and understand- 
ing best practices within technology-driven learning. While these studies 
help researchers to identify evidence-based instructional improvements, 
findings also lead to the generation of new hypotheses that expand investi- 
gation or reroute postulated theories. Results from a single study may gen- 
erate four new hypotheses, with the potential for exponential expansion 
as a line of research evolves. The results of these studies can also benefit 
ASSISTments: Findings regarding best practices continuously improve the 
system’s content and delivery while pinpointing areas for broad change 
through infrastructure improvements. Thus, a collaborative and open re- 
search infrastructure supports perpetual evolution on a small scale within 
the system and on a large scale across research communities. 


DEVELOPING COLLABORATIVES AROUND SHARED 
SCIENTIFIC TOOLS 


To get the most out of educational technologies, learning platforms 
must be revolutionized into shared scientific instruments. Through the 
ASSISTments TestBed, ASSISTments is attempting to initiate this move- 
ment by stepping forward as the Hubble telescope of learning science. 
Unlike a static piece of equipment, the platform can be used to run mul- 
tiple experiments simultaneously, and researchers are able to improve the 
instrument for others through their experiences. Through this collabora- 
tive approach, as shown in Figure 5, researchers bring many ideas and 
hypotheses to the TestBed. Some of the studies designed around these 
hypotheses result in reliably positive effects, whereas others are extended 
to form stronger research questions. Through this process, researchers al- 
ter and enhance content and feedback within ASSISTments. Students and 
teachers benefit from stronger content while researchers expand their 
fields through refereed publications. 

Realization of the platform’s value as a shared scientific tool has en- 
couraged research at scale from universities including Boston College; 
Freiburg University; Harvard University; Indiana University; Northwestern 
University; Southern Methodist University; Texas A&M; University of 
Colorado Colorado Springs; University of California, Berkeley; University 
of Maine; University of Wisconsin; and Vanderbilt University. Since its
inception, interest in the TestBed has continued to expand through a
kickoff webinar, an AERA seminar, and well-documented support for researchers
made possible by NSF funding (Heffernan & Williams, 2014).

Figure 5. Research within the ASSISTments TestBed leads to knowledge of best practices,
enhancements to student learning outcomes, and peer-reviewed publications. Multiple
iterations of hypotheses may arise, enhancing system content and strengthening content
delivery as work progresses.

By articulating specific challenges for improving K-12 mathematics edu- 
cation to a broad and multidisciplinary community of psychology, educa- 
tion, and computer science researchers, this funding allows researchers to 
collaboratively (and perhaps competitively) propose and conduct RCEs at 
an unprecedentedly precise level and large scale. The following list high- 
lights the broad spectrum of work that researchers have shown interest in 
examining further within the TestBed: 


Types of feedback

• Immediate versus delayed feedback (Fyfe, Rittle-Johnson, & DeCaro, 2012)
• Comparing the types of hints provided adaptively to learners (Stamper, Eagle, Barnes, & Croy, 2013)
• Comparing levels of feedback, from guided to open (Sweller, Kirschner, & Clark, 2007)
• Comparing “what you see is what you get” with interaction (Keehner, Hegarty, Cohen, Khooshabeh, & Montello, 2008)
• Prompting for comparison of analogous problems and worked examples (Jee et al., 2013)

Sequencing and spacing

• Changing schedules and procedures for practice sessions and quizzes (Roediger & Karpicke, 2006)
• Testing the effectiveness of pretesting prior to instruction (Richland, Kornell, & Kao, 2009)
• Spacing skill content (Pashler, Rohrer, Cepeda, & Carpenter, 2007)
• Examining testing effects (Butler & Roediger, 2007)

Self-regulated learning and metacognition

• Testing interventions to increase motivation and teach strategies (Ehrlinger & Shain, 2014)
• Examining how task framing changes what students learn (Belenky & Nokes-Malach, 2013)
• Examining metacognitive scaffolding provided in problem-solving (Roll, Holmes, Day, & Bonn, 2012)
• Testing the value of free recall (Arnold & McDermott, 2013)

Social context and interaction

• Adapting instructional materials to students’ personal and peer interests (Walkington, 2013)
• Embedding software and dynamics for peer assistance (Walker, Rummel, & Koedinger, 2011)
• Examining how confidence affects performance in early algebra (Mazzocco, Murphy, Brown, Rinne, & Herold, 2013)

Assessment

• Examining computational models used to diagnose learner state (Rafferty & Griffiths, 2014)
• Examining computational methods for assessing affective states (Ocumpaugh, Baker, Gowda, Heffernan, & Heffernan, 2014)
• Examining forgetting (Storm, Bjork, Bjork, & Nestojko, 2006)

Motivation

• Embedding motivational videos from teachers (Kelly, Heffernan, D’Mello, Namias, & Strain, 2013)
• Incorporating messages to foster growth mindset (Williams, 2013)
• Examining the effects of goal-setting (Bernacki, Byrnes, & Cromley, 2012)
• Examining the effects of student choice (Chernyak & Kushnir, 2013)
• Inserting quizzes and tests to maintain and guide student focus (Szpunar, Khan, & Schacter, 2013)

Mathematics education

• Comparing representational formats in supporting mathematics learning (Rau, Aleven, Rummel, & Rohrbach, 2012)
• Investigating effective presentations of worked examples in mathematics (Booth, Lange, Koedinger, & Newton, 2013)
• Examining strategies for learning fractions (Cordes, Williams, & Meck, 2007)
• Testing images of manipulatives versus virtual manipulatives (Mendiburo, Sulcer, Biswas, & Hasselbring, 2012)


By building these types of collaborative scientific tools, the cost of fund- 
ing educational research could be drastically reduced. For instance, the 
Institute of Education Sciences (IES) currently funds efficacy trials for
promising interventions that cost an average of $3 million and can involve 
more than 50 schools. Larger and more stringent effectiveness trials carry 
a median cost of $6 million. In the math and science domains, the IES 
has funded 22 efficacy trials and five effectiveness trials. Despite the high 
cost of funding this work, reliable positive implications for educational 
practice are rarely observed. Using adaptive technologies geared toward 
research, large-scale trials could be expedited at a fraction of the cost. The 
IES funding pipeline (IES, 2015) and the ASSISTments TestBed equiva- 
lent are depicted in Figure 6. Studies that were once restricted by the avail- 
ability of funding could be considered through learning technologies. 
Much of the efficacy attained through use of the TestBed is due to student- 
level randomization (rather than traditional class- or school-level randomiza- 
tion), allowing experiments to be conducted within classrooms rather than 
across classrooms. This accrues drastically larger samples, increasing the 
power of analyses in order to better detect the reliable effects of interven- 
tions. The unique ability for student-level randomization, coupled with the 
scalability inherent in manipulating prebuilt content of interest to a large 
user base, allows in vivo educational research to gain the minimally invasive 
A/B flavor often used in marketing. Studies within the TestBed also align 
with typical educational practice (i.e., students are never intentionally dis- 
advantaged by a study design). This approach allows students to access and 
complete assignments, often without awareness that they are participating 
in research. Teachers are made aware of experimentation through a con- 
ventional assignment-naming procedure that tags experiments with “Ex.” As 
data dissemination is carefully preprocessed to protect students’ identities 
and students receive assignments that are within the definition of normal 
instructional practice, this passive approach to research is IRB-approved. 
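
The power advantage of student-level randomization described above can be illustrated with the standard design effect for clustered samples, 1 + (m - 1) x ICC, where m is the cluster size and ICC is the intraclass correlation. The following is a minimal sketch, not an analysis from the TestBed: the sample size, class size, and ICC are hypothetical values chosen only for illustration, and treating student-level randomization as a cluster size of one is a simplifying assumption.

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Design effect for cluster randomization: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(n_students: int, cluster_size: float, icc: float) -> float:
    """Number of independent observations the clustered sample is worth."""
    return n_students / design_effect(cluster_size, icc)


if __name__ == "__main__":
    n_students = 1000  # hypothetical total sample
    class_size = 25    # hypothetical students per classroom
    icc = 0.20         # hypothetical intraclass correlation

    # Class-level randomization: whole classrooms share one condition.
    class_level = effective_sample_size(n_students, class_size, icc)
    # Student-level randomization, simplified here to clusters of size one.
    student_level = effective_sample_size(n_students, 1, icc)

    print(f"class-level effective n:   {class_level:.0f}")
    print(f"student-level effective n: {student_level:.0f}")
```

Under these assumed values, the same 1,000 students carry far less statistical information when whole classrooms are randomized than when students are randomized individually.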
While low-cost procedures may not hold for all educational investiga- 
tions (e.g., the design of full learning programs or platforms that require 
significant funding), there are many benefits to cost-effective, efficient, 
and rigorous experimentation that can be conducted using educational 
technologies. Many unique features make ASSISTments capable of serv- 
ing researchers as a shared scientific tool. However, ASSISTments is not 
the only platform with the power to drive a collaborative like the TestBed. 
The majority of learning applications have the capacity for data collec- 
tion, and many could be restructured to offer the flexibility required for 
experimental content manipulation. Other platforms may also be capable 
of establishing APIs to deliver preprocessed data, anonymized for student 
protection, to researchers conducting RCEs or even wishing to mine data. 
With similar research-based platforms in the field, it would also be possi- 
ble for researchers to compare learning interventions across platforms to 
better measure the reliability and generalizability of results. Collaborative
research goals that crosscut platforms may finally usher in the tipping 
point of educational technologies (Bush & Mott, 2009; Gladwell, 2002) 
as researchers grow to understand What works best? For whom? When? 


COLLABORATIVE RESEARCH AT SCALE OFFERS 
PERPETUAL BENEFITS 


The power of the ASSISTments TestBed as a collaborative research tool did 
not come about overnight. As a learning platform, ASSISTments has piv- 
oted numerous times in the past decade (Heffernan & Heffernan, 2014). 
The steady improvements of the TestBed were largely driven by the results 
of pilot studies within the system. This growth and adaptation exemplifies 
perpetual evolution. Essentially, a simple hypothesis acts as the seed for an 
expansion of research that germinates through related ideas, eventually 
pushing the limits of the system until infrastructure improvements must be 
made to accommodate further questions—a cycle depicted in Figure 7. As 
the cycle begins, researchers form novel hypotheses that compare manipu- 
lations within the platform to best (known) practices—either comparable 
traditional classroom practices or previous versions of the platform’s mate- 
rial. Early results inspire collaborative idea expansion through replications 
and extensions of studies that serve to enhance system content and con- 
tent delivery while improving student learning and advancing the state of 
knowledge in the field through peer-reviewed publications. New hypotheses 
form and grow as results are observed, naturally evolving until they push 
the boundaries of the platform’s infrastructure. In response, scientifically 
validated infrastructure improvements can be tailored to research demand, 
forming the final stage of this cycle. New system features, a mark of evolu- 
tion, allow researchers to start the cycle anew with novel hypotheses. 
Ever-expanding progress is a core concept for effectively marketing com- 
mercial products, but it is far less common in education. Education is a 
difficult rock to move, with teachers and administrators holding tight to 
traditional methods, and pushing back against the changes brought about 
by modern technologies (Bush & Mott, 2009). It is hardly surprising that 
most educational technologies lack collaborative research infrastructures. 
Administrators have not been focused on examining the effectiveness of new 
instructional strategies made possible by these platforms because most plat- 
forms have instead been tailored to simplify traditional teaching methods 
(Bush & Mott, 2009). As educators continue to grow more open to the pos- 
sibilities of learning technologies, the value of collaborative research at scale 
will escalate. By establishing research environments like the TestBed,
creators and users of educational technologies will learn of the
unprecedented benefits made possible by the cycle of perpetual evolution.
The following sections step through this cycle, defining exemplary research
at each stage as conducted within ASSISTments and the ASSISTments TestBed.

Figure 7. The cycle of perpetual evolution that stems from use of an
educational platform as a collaborative research tool. An initial hypothesis
comparing new methods to best (known) practices grows into a series of
ideas that improve system content while benefiting students and advancing
knowledge in the field. These ideas continue to grow until limited by the
platform’s capabilities. Infrastructure improvements validated by previous
findings and inspired by research demand can then be made to return the
cycle to a fresh starting point, where new hypotheses can be formed.

[Figure 7 stages: (1) The Seed: Comparing Research-Generated Content to
Best (Known) Practices; (2) Collaborative Hypothesis Growth: Enhancing
System Content, Improving Learning, Advancing Science; (3) Infrastructure
Improvements: Research-Based Platform Evolution]


THE SEED: COMPARING RESEARCH-GENERATED CONTENT TO BEST 
(KNOWN) PRACTICES 


Kelly, Heffernan, Heffernan, et al. (2013) used ASSISTments to compare 
traditional mathematics homework (with delayed, next-day feedback) 
to the same assignment featuring immediate correctness feedback. All 
students participating in this RCE used ASSISTments to complete their 
homework, with feedback settings differing between randomly assigned 
conditions. The research design included 20 questions delivered using 
skill triplets (i.e., three similar skill problems presented consecutively) to 
determine the effectiveness of correctness feedback. Students in the con- 
trol condition did not receive feedback while completing their homework, 
as shown in Figure 8. Blue dots within the left menu show completed prob- 
lems. The next day in class, the teacher reviewed the homework without 
using ASSISTments reports and simply read answers aloud as students
corrected their work. The teacher then worked through requested problems
on the board. Students in the experimental condition received immedi- 
ate correctness feedback while completing their homework, as shown in 
Figure 9. The next day in class, the teacher used data from the item report 
to determine which problems to focus on during the homework review, 
with an emphasis on common wrong answers shared by multiple students. 


Figure 8. Control condition as experienced by the student (Kelly,
Heffernan, Heffernan, et al., 2013): Students were not told whether
their answers were correct or incorrect, an approach that mirrors
traditional homework. This study implemented problem triplets, or sets
of three questions per skill, providing multiple opportunities to display
skill knowledge.

[Screenshot: Assignment 4.6 (Negative Exponents TEST), Problem ID PRAHE5Z,
“Find the product. Write your answer using only positive exponents.” After
the student submits an answer, the tutor displays only “Answer recorded.
Click ‘Next Problem’” with no indication of correctness.]

Analysis of 63 students suggested reliable improvements in student 
learning through the addition of correctness feedback. Students in the 
control group showed an average gain of 59% from pretest to post-test (an 
effect size of 0.52), whereas students in the experimental group showed 
an average gain of 74% (an effect size of 0.56). It should be noted that 
Cohen’s rule of thumb for interpreting effect sizes has been somewhat 
discredited as a measure for benchmarking the practical significance of 
effects, especially when working with researcher-defined measures (Lipsey 
et al., 2012). Instead, it is recommended that researchers compare growth 
attributed to an intervention to normative expectations. Comparing gains 
across conditions, this method suggests a reliable 15% increase in average 
learning gains. It is also possible to benchmark these findings against the 
results of similar studies, which have a mean effect size of 0.43 (Lipsey 
et al., 2012), showing the clear strength of providing immediate
correctness feedback as an intervention. Kehrer, Kelly, and Heffernan (2013)
replicated the positive effects of immediate correctness feedback observed
in Kelly, Heffernan, Heffernan, et al.’s original work (2013).

Figure 9. Experimental condition as experienced by the student (Kelly,
Heffernan, Heffernan, et al., 2013): Students were provided immediate
correctness feedback as they responded to each problem. The student in
this example was able to self-correct and progress through the first skill
triplet but struggled with the second.

[Screenshot: Assignment 4.6 (Negative Exponents), “Find the product. Write
your answer using only positive exponents.” The left menu marks each
attempted problem as correct or incorrect, and the hint panel shows “The
answer is 1/5^3.” The student’s answer field contains 1/5^3.]
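
The two benchmarks discussed for the correctness-feedback study (a standardized effect size and a direct comparison of average gains) can be sketched as follows. This is an illustrative sketch only: the pooled-standard-deviation form of Cohen's d is one common convention and is assumed here rather than taken from the original analysis, and the score lists in the usage example are hypothetical. Only the 59% and 74% average gains come from the study as reported above.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * stdev(group_a) ** 2 + (n_b - 1) * stdev(group_b) ** 2
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def gain_difference(control_gain_pct, treatment_gain_pct):
    """Difference in average gains, in percentage points."""
    return treatment_gain_pct - control_gain_pct


if __name__ == "__main__":
    # Average gains reported for the control and experimental conditions.
    print(gain_difference(59, 74))  # difference in percentage points

    # Hypothetical scores, used only to exercise cohens_d.
    print(cohens_d([2, 4, 6], [1, 3, 5]))
```

Reporting both values side by side makes it easier to follow the recommendation of Lipsey et al. (2012) to interpret growth against normative expectations rather than against a fixed rule of thumb.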

Similar hypotheses examining the efficacy of feedback within 
ASSISTments have led to numerous publications over the past decade. 
Mendicino et al. (2009) examined the effectiveness of mathematics 
homework with scaffolded tutoring in comparison to traditional paper- 
and-pencil homework. Students who received adaptive scaffolding showed 
significant learning gains over those following traditional homework 
procedures. Razzaq, Heffernan, and Lindeman (2007) suggested that 
adaptive scaffolding led to greater learning gains than on-demand hints. 
Researchers observed an interaction between students’ proficiency levels 
and the effectiveness of feedback styles, with less-proficient students ben- 
efiting from scaffolding and more-proficient students benefiting from hints. 
Follow-up studies confirmed that on-demand hints produced more reliable 
and robust learning in highly proficient students (Razzaq & Heffernan, 
2010). Singh et al. (2011) then compared correctness feedback with on- 
demand hints. Multiple trials consistently revealed that hint feedback led to 
significantly improved learning over correctness feedback alone. Research 
has also examined the content presented within feedback, through compar- 
isons of worked examples and scaffolded problem-solving (R. Kim, Weitz, 
Heffernan, & Krach, 2009; Shrestha et al., 2009) and investigations of moti- 
vational feedback (Kelly, Heffernan, D’Mello, et al., 2013; Ostrow, Schultz, 
& Arroyo, 2014). Results suggesting the consistent benefits of feedback have
allowed researchers working within ASSISTments to expand their questions 
from seeds (Does immediate feedback help?) to more detailed investiga- 
tions (What type of immediate feedback is most effective?). 


COLLABORATIVE HYPOTHESIS GROWTH: ENHANCING SYSTEM 
CONTENT, IMPROVING LEARNING, ADVANCING SCIENCE 


Ostrow and Heffernan (2014) expanded on the “feedback is good” hy- 
pothesis to examine the effectiveness of various feedback mediums. Prior 
to this study, ASSISTments delivered feedback via text, altering color and 
typeface to draw students’ attention to significant variables and themes. 
This RCE pushed that boundary to compare learning outcomes when 
identical feedback was delivered using short video snippets. Outcomes of 
student performance and response time were measured across six prob- 
lems pertaining to the Pythagorean theorem. All students received the 
same six questions in mixed orders, receiving three opportunities for text 
feedback and three opportunities for video feedback over the course of 
the assignment. As shown in Figure 10, feedback was matched across medi- 
um; videos consisted of a researcher working through each feedback step 
while referencing images on a whiteboard. Students received feedback 
through scaffolds, by either requesting assistance or answering a problem 
incorrectly. Learning gains on the second question were compared across 
students who received feedback on the first question. Following the prob- 
lem set, students were asked a series of survey questions to judge how they 
viewed the addition of video to their assignment. 

Results of an analysis of 89 students who completed the assignment and 
were able to access video content revealed that video feedback increased 
the likelihood of accuracy on the next problem. Students spent significantly 
longer consuming video feedback but answered their next question more 
efficiently. Assessing self-report measures, 86% of students found the vid- 
eos at least somewhat helpful, and 83% of students wanted video in future 
assignments (Ostrow & Heffernan, 2014). Based on these findings, teach- 
ers and researchers have been recruited to create video feedback for skill 
builder problems to expand the amount of video content available within 
the system and allow for further examination into the effects of video. The 
ease with which teachers and researchers are able to record short video mes- 
sages and upload them to the system suggests that this approach is a plau- 
sible avenue for crowdsourcing feedback (Howe, 2008; Kittur et al., 2013). 
Crowdsourcing and learnersourcing (J. Kim, 2015) feedback are future di- 
rections for the ASSISTments platform, as infrastructure improvements are 
required to optimally support, organize, and vet feedback collection at scale. 
Many of the studies that best define collaborative hypothesis growth
are currently under way within the TestBed, examining the effectiveness 
of particular types of feedback. Numerous researchers are investigating 
what drives the apparent effects of video feedback by comparing vari- 
ous types of videos (e.g., recorded human tutoring, a “pencast” problem 
walkthrough with audio explanation, and peer videos with tutoring led by 
other students). Many of these studies are pushing ASSISTments’ techno- 
logical boundaries, establishing a demand for specific infrastructure im- 
provements that will help the system and its content to evolve. 


INFRASTRUCTURE IMPROVEMENTS: RESEARCH-BASED PLATFORM 
EVOLUTION 


Research on the efficacy of feedback mediums laid the groundwork for 
debates about the possible impacts of allowing students to choose between 
mediums. Without any real capacity to provide choice, ASSISTments was 
reaching a tipping point for infrastructure improvement. A pilot study was 
conducted by taking advantage of bugs in the system to mock up student 
choice (Ostrow & Heffernan, 2015). This simple RCE examined interac- 
tions between student choice and feedback medium using a 2 x 2 factorial 
design, depicted in Figure 11. Two versions of a problem set on simple 
fraction multiplication were created, one incorporating text feedback and 
one incorporating video feedback. Short, 15- to 30-second video snippets 
were designed to be as comparable to text feedback as possible, in order 
to compare delivery medium. At the start of the assignment, students were 
randomly assigned to either the choice condition or the control condi- 
tion. Those assigned to the choice condition were asked what type of feed- 
back they wished to receive while working on their assignment, as shown 
in Figure 12, and were routed accordingly. Those assigned to the control 
were immediately reassigned to either video or text feedback. 
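
The assignment logic of this 2 x 2 design can be sketched in a few lines. This is a hypothetical illustration, not ASSISTments code: the function name, the condition labels, and the equal assignment probabilities are all assumptions made for the sketch.

```python
import random


def assign_condition(student_preference, rng):
    """Return (condition, feedback_medium) for one student.

    student_preference is a callable returning "text" or "video"; it is
    consulted only when the student lands in the choice condition.
    Equal assignment probabilities are an assumption of this sketch.
    """
    if rng.random() < 0.5:
        # Choice condition: the student is asked and routed accordingly.
        return ("choice", student_preference())
    # Control condition: the medium is assigned at random.
    return ("control", rng.choice(["text", "video"]))


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so the sketch is reproducible
    for _ in range(5):
        print(assign_condition(lambda: "video", rng))
```

Keeping the preference prompt inside the choice branch mirrors the design described above, in which control students are never asked which medium they would prefer.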

For a sample of 78 middle school students who completed this pilot, 
results suggested that feedback medium did not have a specific impact 
on learning gains within this context, contrary to results presented ear- 
lier on the efficacy of video feedback, suggesting that perhaps video is 
not effective for all age ranges or skill domains and beginning to answer 
What works best? For whom? When? However, students who were able 
to choose their feedback medium showed significant improvements over 
students who were randomly assigned a medium. Students with choice 
earned higher scores on average, used fewer hints and attempts, and
persisted longer than those not provided choice. Perhaps the most
interesting observation: Learning gains were higher in students who were
provided choice, regardless of whether or not the student actually ended up
requesting feedback during the assignment (Ostrow & Heffernan, 2015).

Figure 11. Experimental design used to investigate student choice as a
pilot study within ASSISTments (Ostrow & Heffernan, 2015); prior to
this study, students were not able to exert control over their assignments
within the platform.

[Figure 11 diagram: Random Assignment splits students into Choice
(Experiment) and No Choice (Control). Choice leads to Text Feedback
(Preferred) or Video Feedback (Preferred). No Choice leads to Text
Feedback (Random Assignment) or Video Feedback (Random Assignment).]

Figure 12. Student preference prompt guiding medium routing for those
in the experimental condition (Ostrow & Heffernan, 2015).

[Screenshot: Assignment “ReRoute,” Problem ID PRAXCHQ: “This problem set
is a little bit different. We want to give you some say in how you learn!
Would you prefer: Hints and feedback that use text to help you when you
feel stuck. OR Hints and feedback that use short videos to help you when
you feel stuck. Select your answer below.” Answer options: “I prefer text
feedback!” and “I prefer video feedback!”]

These results became the driving force for a significant infrastructure im- 
provement within the ASSISTments platform that would allow for condi- 
tional path routing. An If-Then routing structure was developed under 
the SI2 NSF grant (Heffernan & Williams, 2014) to extend research capa- 
bilities within ASSISTments and the ASSISTments TestBed. Hypotheses 
regarding student choice and other routing-based research can now be 
easily examined with greater validity and at scale. 

A replication of the choice pilot by Ostrow and Heffernan (2015) was 
designed using the if-then routing structure, as shown in Figure 13. The
inclusion of conditional path routing helped to enhance the internal va- 
lidity of video-based research by allowing sample populations to be refined 
to include only students with the technological capacity to view video con- 
tent. Although, in hindsight, this feature seems like an obvious require- 
ment for video-based research, it was not possible within ASSISTments 
prior to if-then routing. Thus, it is clear how this new feature has the po- 
tential to improve and expand research within the ASSISTments TestBed. 

An example of how a researcher might go about building an ASSISTments
problem set with simple if-then routing is shown in Figure 14. The building 
process requires three elements: a conditional statement, a true path, and 
a false path (Ostrow & Heffernan, 2016). The conditional statement can 
include a problem or problem set, with an adjustable setting that guides 
path routing based on student performance as measured by completion or 
accuracy. If performance meets this preset threshold, the student is rout- 
ed into the true path, or the second section in Figure 14 (“Video chosen”). 
If performance does not meet this preset threshold, the student is routed 
into the false path, or the third section in Figure 14 (“Text chosen”). In 
this example, the conditional statement is a single preference question, 
much like that shown in Figure 12. Video feedback is set as the “correct” 
answer, routing students based on the then clause, while text feedback is 
set as the “incorrect” answer, routing students based on the else clause. 
Students receive this problem in test mode (i.e., without correctness feed- 
back, showing only a blue dot for completion), thereby restricting the in- 
ner workings of the routing system from student view and removing the 
risk of undue penalties for a “wrong” opinion. Numerous studies now run- 
ning within the ASSISTments TestBed implement if-then routing in some 
capacity (e.g., as technical validation, as adaptive performance routing, to 
trigger interventions for struggling students, or to buffer sampling within 
intent-to-treat studies seeking to help only students with low skill profi- 
ciency). This simple infrastructure improvement completes an iteration of 
the cycle of perpetual evolution, opening new avenues for fresh seed-level 
hypotheses to start the cycle anew. 
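
The three elements described above (a conditional statement, a true path, and a false path) can be sketched as a small data structure. This is a hypothetical model, not the ASSISTments implementation: the class name, field names, problem identifiers, and the choice of "greater than or equal to" for the threshold comparison are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IfThenElse:
    threshold: float       # percent correct required on the conditional part
    true_path: List[str]   # problems served when the threshold is met
    false_path: List[str]  # problems served otherwise

    def route(self, conditional_score: float) -> List[str]:
        """Pick a path from the score on the conditional part."""
        if conditional_score >= self.threshold:  # assumed comparison
            return self.true_path
        return self.false_path


def preference_score(choice: str) -> float:
    """Score the preference question with video marked as 'correct'."""
    return 100.0 if choice == "video" else 0.0


if __name__ == "__main__":
    routing = IfThenElse(
        threshold=50.0,
        true_path=["video_problem_1", "video_problem_2"],  # hypothetical IDs
        false_path=["text_problem_1", "text_problem_2"],   # hypothetical IDs
    )
    print(routing.route(preference_score("video")))
    print(routing.route(preference_score("text")))
```

Because the preference question is delivered in test mode, the student never sees that one opinion was scored as "correct"; the score exists only to drive the routing decision.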
Figure 14. Researcher’s view while constructing a study using if-then
routing within an ASSISTments problem set; study design shown here
mirrors that in Figure 13.

[Screenshot: Problem set “Choice Condition” with Problem Set Type set to
If-Then-Else. The settings panel explains that the if-then-else structure
requires three objects in the problems section: a conditional part (a
problem set, a skill builder, or a problem) that returns a true or false
result, a then part, and an else part, either of which may be left empty.
Problem Set Correctness is set to 50, and problems can be switched between
Tutor Mode and Test Mode. The problems section lists “Video Chosen (True
Path)” and “Text Chosen (False Path),” each a Skill Builder in random
order. Annotations read “True”: conditional problem answered correctly, and
“False”: conditional problem answered incorrectly.]

FUTURE DIRECTIONS OF THE ASSISTMENTS PLATFORM 


It is difficult to advocate for a future consisting of research-infused ed- 
ucational technologies without touching briefly on future goals for the 
ASSISTments platform. With a focus on disseminating the ASSISTments 
TestBed and enhancing its validity as a collaborative tool for sound sci- 
ence, the cycle of perpetual evolution will bring about a number of sig- 
nificant infrastructure improvements for ASSISTments in the near future. 
Perhaps the most immediate change, as suggested by the research pre- 
sented herein, will be extending the platform to support teachersourced 
and learnersourced feedback. The platform has 25,000 vetted mathemat- 
ics problems that were created by WPI and Carnegie Mellon University. 
In addition, teachers have added over 100,000 problems to the platform, 
many of which already include some form of feedback. The first step to- 
ward crowdsourcing feedback for these problems is to allow teachers to 
create tutoring strategies in support of content owned by others, rather
than only in support of their own content. Differing teachers will offer 
differing solution approaches, which may help struggling students to see 
a problem from a different perspective. A select group of teachers and 
students have already recorded video feedback for use in a set of RCEs 
examining the potential benefits and obstacles of crowdsourcing feedback 
at scale. Eventually, this approach will be scaled to allow students to show 
their work and provide explanations for their peers through a tool called 
PeerASSIST (Heffernan et al., 2016). A task already appreciated by most 
mathematics teachers, showing work will help students to solidify their un- 
derstanding of the content while creating feedback to benefit other users 
(Kulkarni et al., 2013). The network effects inherent to teachersourcing 
and learnersourcing feedback will enhance system content at an impres- 
sive scale (Bush & Mott, 2009). 

The implementation of crowdsourcing will naturally give way to another 
goal for the future of ASSISTments: establishing an automated process 
to select optimal feedback using contextual k-armed bandits. This is an 
algorithmic approach, rooted in the theory of sequential design (Robbins, 
1952), to the exploration-exploitation tradeoff. Essentially, with a pool of
content available to students (i.e., many types of feedback), it is necessary 
to repeatedly sample the efficacy of assigned content in order to maximize 
the delivery of effective content while minimizing the delivery of ineffec- 
tive content. The use of k-armed bandits will minimize detriment to stu- 
dents while allowing for the dynamic versioning of materials and setting 
the stage for personalized learning (i.e., algorithmically establishing What 
works best? For whom? When?). An important feature that will grow from 
the implementation of k-armed bandits will be the capacity to store user 
variables for lasting personalization. Variables such as initial performance, 
particular student responses, or specific student characteristics could help 
to optimize content and feedback delivery for each student, both within 
and across assignments. The ASSISTments team expects that these goals 
will strengthen the platform and inspire new avenues for scientific inquiry. 
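
The exploration-exploitation tradeoff described above can be illustrated with a small simulation. The sketch below uses Thompson sampling with Beta priors, which is one of several bandit strategies and is an assumption of this example rather than the method planned for ASSISTments. It is also non-contextual, so it ignores student characteristics. The arm names and success rates are hypothetical.

```python
import random


class ThompsonBandit:
    """Beta-Bernoulli Thompson sampling over a fixed set of feedback types."""

    def __init__(self, arms, seed=None):
        self.rng = random.Random(seed)
        # Start every arm from a uniform Beta(1, 1) prior.
        self.successes = {arm: 1 for arm in arms}
        self.failures = {arm: 1 for arm in arms}

    def select(self):
        """Sample a plausible success rate per arm and pick the highest."""
        samples = {
            arm: self.rng.betavariate(self.successes[arm], self.failures[arm])
            for arm in self.successes
        }
        return max(samples, key=samples.get)

    def update(self, arm, correct_next):
        """Record whether the student answered the next problem correctly."""
        if correct_next:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1


def simulate(true_rates, rounds, seed=0):
    """Count how often each arm is delivered under assumed success rates."""
    bandit = ThompsonBandit(list(true_rates), seed=seed)
    env = random.Random(seed + 1)  # separate stream for simulated students
    pulls = {arm: 0 for arm in true_rates}
    for _ in range(rounds):
        arm = bandit.select()
        pulls[arm] += 1
        bandit.update(arm, env.random() < true_rates[arm])
    return pulls


if __name__ == "__main__":
    # Hypothetical success rates for two kinds of feedback.
    print(simulate({"text_hint": 0.4, "video_hint": 0.7}, rounds=2000))
```

As evidence accumulates, the sampler delivers the less effective feedback less often, which is the sense in which bandit methods limit detriment to students while still testing alternatives.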


IN CONCLUSION: INFUSE EDUCATIONAL TECHNOLOGIES WITH 
COLLABORATIVE RESEARCH TO PROMOTE SOUND SCIENCE 


Systemic change does not stem from a small number of large-scale RCEs 
funded by government grants, but instead from a revolution in thought 
surrounding the value of technology-based learning applications. As 
shown herein, infusing preexisting learning technologies with the ca- 
pability to support RCEs is the first step in kick-starting this revolution. 
From there, the platform can expand as a shared scientific tool used by 
a community of researchers collaborating to better understand the effi- 
cacy of educational interventions. ASSISTments bridges practice and re- 
search by enabling researchers to work collaboratively with teachers and 
students and by providing unprecedented access to authentic learning 
environments and actionable classroom data. The collaborative nature 
of the ASSISTments TestBed gives rise to a cycle of perpetual evolution 
that inspires continuous advancements to ASSISTments content while 
simultaneously advancing knowledge of best practices. Insights and in- 
novations drawn from research findings can be incorporated into the 
system itself as well as future research, with each successive step building 
upon previous contributions. 

Research-infused platforms have the potential to drive inquiry for a 
diverse community of researchers through the low-cost, rapid iteration 
of valid, generalizable, and noninvasive investigations within authentic 
learning environments. Systems like ASSISTments can provide researchers 
with access to an extensive and diverse subject pool, automated 
fine-grained logging of educational data, validated measures of stu- 
dent learning and affect, and automated data reporting and analysis to 
tackle the high-stakes nature of typical education research. With similar 
research-focused platforms in the field, it would also be possible for re- 
searchers to compare learning interventions across platforms to better 
measure the reliability and generalizability of results. These platforms of- 
fer a unique opportunity for the synergistic growth of research and policy 
detailing best practices in education. If these platforms grow to welcome 
collaborative research, educational technology will reach its long-awaited 
tipping point and begin to have a broad impact on the efficacy and valid- 
ity of research across domains. Tomorrow’s educational technology de- 
mands a revolution in today’s approaches to research at scale: pave the 
way for sound collaborative science, and the rest will follow. 